
ARIMA Models

Lecture 8  ·  Autoregressive Integrated Moving Average

What makes ARIMA different from exponential smoothing?

ARIMA models describe a series using its own past values and past forecast errors.
ETS models forecast by decomposing a series into level, trend, and seasonality components. ARIMA takes a different approach: it asks how the current value of a series is related to its own past values and to past shocks.
ARIMA stands for AutoRegressive Integrated Moving Average. Three elements:
  • AR(p) — the series is regressed on its own p past values.
  • I(d) — the series is differenced d times to achieve stationarity.
  • MA(q) — the model includes q past forecast errors (shocks).
A model with all three is written ARIMA(p, d, q).

What is stationarity, and why does ARIMA require it?

A stationary series has constant mean, variance, and autocorrelation over time.
ARIMA’s AR and MA components are only well-defined for stationary series — a series where the statistical properties do not change with time. Most economic and business time series are not stationary: they have trends, growing variance, or both.
Signs of non-stationarity:
  • A clear upward or downward trend (mean is not constant).
  • Variance that grows with the level (e.g., percentage fluctuations on a trending series).
  • ACF that decays very slowly rather than cutting off quickly.
  • A KPSS test that rejects stationarity, or an ADF test that fails to reject a unit root.
The fix: differencing. Taking the first difference y′t = yt − yt−1 removes a linear trend. The number of differences needed is the d in ARIMA.
Differencing transforms a non-stationary series into a stationary one.
First difference (d = 1) — removes a linear trend:
y′t = yt − yt−1
Second difference (d = 2) — removes a quadratic trend (rare in practice):
y′′t = y′t − y′t−1
Seasonal difference — removes a seasonal pattern with period m:
y′t = yt − yt−m
  • Over-differencing (too many differences) makes the series harder to model — use unit root tests to guide the choice.
  • The KPSS test has H0: stationary; the ADF test has H0: unit root (non-stationary).
  • In fpp3, unitroot_ndiffs() suggests the number of regular differences; unitroot_nsdiffs() for seasonal.
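The effect of differencing is easy to see directly. Below is a minimal sketch in plain Python (illustrative only; the course workflow uses R/fpp3): first-differencing flattens a linear trend, and seasonal differencing removes a repeating pattern.

```python
# Illustrative sketch in plain Python (the course itself uses R/fpp3).

def first_diff(y):
    """First difference: y'_t = y_t - y_(t-1)."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def seasonal_diff(y, m):
    """Seasonal difference with period m: y'_t = y_t - y_(t-m)."""
    return [y[t] - y[t - m] for t in range(m, len(y))]

# A linearly trending series, y_t = 5 + 2t (made-up numbers):
trend = [5 + 2 * t for t in range(10)]
print(first_diff(trend))           # constant differences: the trend is gone

# A purely seasonal series with period m = 4:
seasonal = [10, 20, 15, 5] * 3
print(seasonal_diff(seasonal, 4))  # all zeros: the pattern is gone
```

Note that each difference shortens the series (by one observation for a first difference, by m for a seasonal difference), which is one reason not to over-difference.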
An AR(p) model regresses the series on its own p lagged values.
AR(1):
yt = c + φ1yt−1 + εt
AR(p):
yt = c + φ1yt−1 + φ2yt−2 + … + φpyt−p + εt
  • The φ coefficients are estimated from data and measure how much each lag predicts the present value.
  • For stationarity, the AR coefficients must satisfy a constraint (roughly: |φ| < 1 for AR(1)).
  • White noise is AR(0). A random walk is AR(1) with φ1 = 1 — non-stationary.
  • The PACF (partial autocorrelation function) cuts off after lag p for a pure AR(p) process.
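The AR idea can be checked with a small simulation. The sketch below (plain Python, made-up parameter values) generates an AR(1) series with φ1 = 0.7 and confirms that its lag-1 sample autocorrelation comes out close to 0.7.

```python
import random

def simulate_ar1(phi, n, seed=1):
    """Simulate y_t = phi * y_(t-1) + e_t with standard normal shocks."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0, 1))
    return y

def lag1_autocorr(y):
    """Sample lag-1 autocorrelation."""
    mean = sum(y) / len(y)
    num = sum((y[t] - mean) * (y[t - 1] - mean) for t in range(1, len(y)))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

y = simulate_ar1(phi=0.7, n=5000)
print(round(lag1_autocorr(y), 2))  # close to the true phi of 0.7
```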
An MA(q) model regresses the series on its own q past forecast errors.
MA(1):
yt = c + εt + θ1εt−1
MA(q):
yt = c + εt + θ1εt−1 + θ2εt−2 + … + θqεt−q
  • The errors εt are the one-step forecast errors (shocks) from the model itself — they are not directly observed.
  • An MA model says: today’s value is driven partly by today’s shock and partly by recent memory of past shocks.
  • The ACF cuts off after lag q for a pure MA(q) process.
  • White noise is MA(0). SES is equivalent to ARIMA(0,1,1) — an integrated MA(1).
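The ACF cut-off can be seen in a similar simulation (again plain Python with made-up values). For an MA(1) with θ1 = 0.6 the theoretical lag-1 autocorrelation is θ/(1 + θ²) ≈ 0.44, and autocorrelations beyond lag 1 are zero.

```python
import random

def simulate_ma1(theta, n, seed=2):
    """Simulate y_t = e_t + theta * e_(t-1) with standard normal shocks."""
    rng = random.Random(seed)
    e = [rng.gauss(0, 1) for _ in range(n + 1)]
    return [e[t] + theta * e[t - 1] for t in range(1, n + 1)]

def autocorr(y, k):
    """Sample autocorrelation at lag k."""
    mean = sum(y) / len(y)
    num = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, len(y)))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

y = simulate_ma1(theta=0.6, n=20000)
print(round(autocorr(y, 1), 2))  # near theta / (1 + theta^2) = 0.44
print(round(autocorr(y, 2), 2))  # near 0: the ACF cuts off after lag 1
```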

How do we use the ACF and PACF to identify the right model order?

ACF and PACF patterns identify the AR and MA orders.

Model | ACF | PACF
AR(p) | Tails off (exponential decay or damped oscillation) | Cuts off after lag p
MA(q) | Cuts off after lag q | Tails off
ARMA(p,q) | Tails off after lag q − p | Tails off after lag p − q
White noise | No significant spikes | No significant spikes
In practice: identifying exact orders from plots is difficult and subjective. AICc-based automatic selection (via ARIMA() in fpp3) is more reliable for most applications.
Significant spikes at seasonal lags (e.g., lag 12 for monthly data) signal the need for seasonal AR or MA terms.
ARIMA(p, d, q) combines differencing, autoregression, and moving average terms.
After differencing d times to produce a stationary series y′t, the full ARIMA model is:
y′t = c + φ1y′t−1 + … + φpy′t−p + θ1εt−1 + … + θqεt−q + εt
Parameter | Meaning | Typical range
p | Number of AR lags | 0–5
d | Degree of differencing | 0, 1, or 2
q | Number of MA lags | 0–5
Special cases: ARIMA(0,0,0) = white noise; ARIMA(0,1,0) = random walk; ARIMA(0,1,1) = SES; ARIMA(1,1,0) = differenced AR(1).
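The random walk case can be verified directly: a random walk is the cumulative sum of white noise, so first-differencing it recovers the shocks themselves, which is exactly why ARIMA(0,1,0) fits it. A plain-Python check (illustrative, seeded for reproducibility):

```python
import random

rng = random.Random(3)
shocks = [rng.gauss(0, 1) for _ in range(100)]

# Random walk: y_t = y_(t-1) + e_t, i.e. the cumulative sum of the shocks.
walk = []
total = 0.0
for e in shocks:
    total += e
    walk.append(total)

# First-differencing the walk gives back the shocks (after the first one):
diffs = [walk[t] - walk[t - 1] for t in range(1, len(walk))]
print(all(abs(d - s) < 1e-9 for d, s in zip(diffs, shocks[1:])))  # True
```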
Seasonal ARIMA adds seasonal AR, differencing, and MA terms for periodic patterns.
A full seasonal ARIMA model is written ARIMA(p,d,q)(P,D,Q)m, where m is the seasonal period (e.g., 12 for monthly, 4 for quarterly).
  • (p,d,q) — non-seasonal AR, differencing, and MA orders (at individual lags).
  • (P,D,Q)m — seasonal AR, differencing, and MA at multiples of m.
  • D = 1 — one seasonal difference: y′t = yt − yt−m, removes annual seasonality.
Classic example: ARIMA(0,1,1)(0,1,1)[12] — the “airline model” (Box & Jenkins, 1976) — used for monthly passenger data with both a trend and seasonal pattern. One regular difference + one seasonal difference + MA(1) terms at both scales.
AICc selects the best ARIMA order by penalizing complexity.
With ARIMA models, the number of possible (p, d, q) combinations is large. Rather than comparing all of them manually via ACF/PACF plots, we minimize the corrected Akaike Information Criterion:
AICc = AIC + 2k(k+1)/(T−k−1)
where k is the number of estimated parameters and T is the sample size.
  • AICc rewards goodness of fit and penalizes extra parameters — avoiding overfit.
  • AICc is preferred over AIC when the sample size is small relative to the number of parameters.
  • BIC imposes a heavier penalty and tends to select more parsimonious models.
  • Do not compare AICc values across models that use different values of d (different amounts of differencing).
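The correction term is simple arithmetic. A quick plain-Python check (the AIC value, k, and T below are made up for illustration) shows why the correction matters mainly for short series:

```python
def aicc(aic, k, T):
    """Corrected AIC: AICc = AIC + 2k(k+1) / (T - k - 1)."""
    return aic + 2 * k * (k + 1) / (T - k - 1)

# With k = 3 parameters and T = 50 observations, the correction is small...
print(round(aicc(100.0, k=3, T=50), 2))  # 100.52
# ...but with only T = 12 observations it is substantial.
print(round(aicc(100.0, k=3, T=12), 2))  # 103.0
```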
fpp3’s ARIMA() function selects the best model automatically.
The ARIMA() function in fpp3 implements a stepwise search procedure (similar to the classic auto.arima() from the forecast package) that:
  • Determines d (and D) using unit root tests.
  • Searches over candidate (p, q) orders and selects by AICc.
  • Returns the model with the best penalized fit.
Basic fpp3 workflow:
fit <- tsbl |> model(ARIMA(y))
report(fit)
gg_tsresiduals(fit)
fit |> forecast(h = 24) |> autoplot(tsbl)
You can also specify the order manually: ARIMA(y ~ pdq(1,1,1) + PDQ(0,1,1)) overrides the automatic search.

Reading ARIMA output from R.

Model specification line
  • e.g., ARIMA(1,1,1)(0,1,1)[12] — non-seasonal AR(1), one difference, MA(1); seasonal MA(1) with period 12.
Coefficients table
  • ar1, ma1, sma1 — estimated AR, MA, and seasonal MA coefficients.
  • Standard errors in parentheses; a coefficient is roughly significant when |coefficient / s.e.| > 2.
Information criteria
  • AIC, AICc, BIC reported — lower is better; compare only models with the same d.
sigma² (variance of residuals)
  • Estimate of the white noise variance — used to compute prediction intervals.
  • Prediction intervals for ARIMA widen with the forecast horizon, reflecting the growing variance of the h-step forecast error.
A well-specified ARIMA model leaves white noise residuals.
Use gg_tsresiduals(fit) in fpp3 to produce a three-panel diagnostic plot:
  • Residual time plot — should look random with constant variance, no obvious patterns.
  • ACF of residuals — all bars should be within the 95% bounds (±1.96/√T). Significant spikes indicate missed autocorrelation.
  • Histogram of residuals — should be roughly normal and centered on zero.
Formal test: Ljung-Box Q* statistic tests the null that all autocorrelations up to lag l are zero. In fpp3: augment(fit) |> features(.innov, ljung_box, lag=24). A large p-value (>0.05) means residuals are consistent with white noise.
If residuals are not white noise, the model is mis-specified — try different (p, q) orders or add seasonal terms.
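The Q* statistic itself is easy to compute by hand. Below is a minimal pure-Python sketch (fpp3 users would just call ljung_box; the 18.31 cut-off used here is the chi-squared 5% critical value for 10 degrees of freedom, and the series is a made-up simulation):

```python
import random

def autocorr(y, k):
    """Sample autocorrelation at lag k."""
    mean = sum(y) / len(y)
    num = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, len(y)))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

def ljung_box_q(y, lags):
    """Ljung-Box Q* = T(T+2) * sum_k r_k^2 / (T - k), k = 1..lags."""
    T = len(y)
    return T * (T + 2) * sum(autocorr(y, k) ** 2 / (T - k)
                             for k in range(1, lags + 1))

rng = random.Random(4)
noise = [rng.gauss(0, 1) for _ in range(500)]

# An AR(1) with phi = 0.9 is heavily autocorrelated, so Q* should be huge:
ar = [0.0]
for e in noise:
    ar.append(0.9 * ar[-1] + e)

print(ljung_box_q(ar, 10) > 18.31)  # True: reject "residuals are white noise"
```

For genuine white noise, Q* usually falls below the critical value, so the null is not rejected.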
ARIMA forecasts are computed recursively from the estimated model.
To forecast h steps ahead, substitute in the estimated AR and MA coefficients. Unknown future errors are set to zero; past errors come from the fitted model. The h-step-ahead point forecast is:
ŷT+h|T = c + φ1ŷT+h−1|T + … + φpŷT+h−p|T + θ1εT+h−1 + …
Prediction intervals are computed from the variance of the h-step forecast error, which grows with the horizon. Under normality, the 95% PI is:
ŷT+h|T ± 1.96 ⋅ σh
Forecasts from integrated (d ≥ 1) models are reported on the scale of the original series because the differences are “undone” by cumulative summation; with d = 1 and no constant, the point forecasts eventually level off at a constant value.
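The recursion is easy to trace for an AR(1). With made-up values c = 2, φ1 = 0.5, and last observation yT = 10, each forecast feeds into the next, and the path decays toward the series mean c / (1 − φ1) = 4:

```python
def ar1_forecasts(c, phi, y_last, h):
    """Recursive AR(1) point forecasts: yhat_(T+h) = c + phi * yhat_(T+h-1).
    Future errors are set to zero, so only the AR recursion remains."""
    forecasts = []
    prev = y_last
    for _ in range(h):
        prev = c + phi * prev
        forecasts.append(prev)
    return forecasts

print(ar1_forecasts(c=2.0, phi=0.5, y_last=10.0, h=4))
# [7.0, 5.5, 4.75, 4.375] -- decaying toward the mean c / (1 - phi) = 4
```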

Strengths and weaknesses of ARIMA models.

Strengths
  • Flexible: can model a wide variety of autocorrelation structures.
  • Statistically principled: based on rigorous probability theory with well-understood properties.
  • Automatic selection with ARIMA() removes subjectivity from order identification.
  • Handles non-stationarity explicitly through differencing.
  • Provides valid prediction intervals under Gaussian errors.
Weaknesses
  • Requires stationarity (differencing may discard useful long-run information).
  • Less intuitive than decomposition or ETS — harder to explain to non-technical audiences.
  • Does not naturally incorporate explanatory variables (though ARIMA(y ~ x) in fpp3 allows regressors).
  • Performs poorly on series with complex, irregular seasonality.
  • Needs a reasonably long history to estimate parameters reliably.

When should you use ARIMA vs. ETS?

They are complementary, not competing
  • Both are automatic, data-driven, and provide prediction intervals.
  • Neither consistently outperforms the other across all series types.
  • The best practice is to fit both and compare on a held-out test set.
Favor ETS when…
  • The series has a clear level, trend, and/or seasonal structure.
  • You want intuitive, interpretable components (e.g., “the trend is growing at 3 units/month”).
  • The series is short — ETS often needs fewer observations to estimate well.
Favor ARIMA when…
  • The series has complex autocorrelation not captured by level/trend/season.
  • You want to include exogenous regressors alongside time-series dynamics.
  • Residual diagnostics from ETS show remaining autocorrelation.
ARIMA models forecast by exploiting the autocorrelation structure of a stationary series.
The three-step ARIMA workflow: (1) make the series stationary via differencing, (2) identify or automatically select AR and MA orders, (3) estimate, diagnose, and forecast.